GPU Architecture

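A warp is a group of threads that the hardware schedules together: 32 threads on NVIDIA GPUs (AMD's equivalent, the wavefront, is 32 or 64 threads wide). All threads in a warp execute the same instruction at the same time. If they take different sides of a branch, the paths run one after the other while the threads on the other path are masked off.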
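A toy Python model can make the lockstep behaviour concrete. It assumes a 32-wide warp and a made-up branch (double even values, increment odd ones), and the function name `run_warp_branch` is hypothetical. It models only the cost: when lanes disagree on a branch, both sides are stepped through and a mask decides which lanes keep each result. Real hardware does this with predication and reconvergence logic that is not modelled here.

```python
# Toy model of SIMT execution (illustration only, not real GPU code).
# ASSUMPTION: 32-wide warp; the branch below is a made-up example.
WARP_SIZE = 32


def run_warp_branch(values):
    """Apply `x * 2 if x is even else x + 1` across one warp, SIMT-style.

    Returns (results, passes), where passes counts how many branch
    sides the whole warp had to step through.
    """
    assert len(values) == WARP_SIZE
    results = list(values)
    # Every lane evaluates the branch condition in lockstep.
    mask = [v % 2 == 0 for v in values]
    passes = 0
    # "if" side: runs if any lane needs it; only masked-in lanes commit.
    if any(mask):
        passes += 1
        for lane in range(WARP_SIZE):
            if mask[lane]:
                results[lane] = values[lane] * 2
    # "else" side: the remaining lanes run while the others sit idle.
    if not all(mask):
        passes += 1
        for lane in range(WARP_SIZE):
            if not mask[lane]:
                results[lane] = values[lane] + 1
    return results, passes


if __name__ == "__main__":
    _, divergent = run_warp_branch(list(range(32)))
    _, uniform = run_warp_branch([2] * 32)
    print(divergent, uniform)
```

With `list(range(32))` the lanes split between even and odd values, so both sides of the branch are executed (2 passes); a warp whose values are all even needs only 1 pass.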

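Each core (a streaming multiprocessor, or SM, in NVIDIA terms) has a block of on-chip memory that is split between L1 cache and shared memory. On recent NVIDIA architectures each core contains 4 processing blocks. Each processing block has its own warp scheduler and issues an instruction for one warp per clock cycle, while keeping several warps resident so it can switch to another when one stalls.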
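A rough Python sketch of how 4 processing blocks share the warps resident on one core. Assigning warp `i` to block `i % 4` is an assumption made for illustration (these notes do not establish the real mapping), and stalls such as memory latency are ignored. `assign_warps` and `cycles_to_issue_once` are hypothetical helpers.

```python
# Toy model of warp issue on one core (SM).
# ASSUMPTIONS: 4 processing blocks per core, each issuing one instruction
# for one warp per cycle, and a hypothetical round-robin mapping of
# warp i to block i % 4. Stalls are ignored.
BLOCKS_PER_CORE = 4


def assign_warps(num_warps, blocks=BLOCKS_PER_CORE):
    """Return, for each processing block, the list of warp ids it owns."""
    owners = [[] for _ in range(blocks)]
    for warp_id in range(num_warps):
        owners[warp_id % blocks].append(warp_id)
    return owners


def cycles_to_issue_once(num_warps, blocks=BLOCKS_PER_CORE):
    """Cycles until every resident warp has issued one instruction.

    Blocks work in parallel, each issuing for one of its own warps per
    cycle, so the busiest block determines the total.
    """
    return max(len(ws) for ws in assign_warps(num_warps, blocks))


if __name__ == "__main__":
    print(assign_warps(10))
    print(cycles_to_issue_once(10))
```

For 10 resident warps, blocks 0 and 1 each own 3 warps, so it takes 3 cycles before every warp has issued one instruction.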

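A dispatched workgroup (a thread block in CUDA terms) is divided into as many warps as it needs to cover its threads, e.g. 256 threads become 8 warps of 32. All warps of one workgroup run on the same core, which lets them communicate through that core's shared memory.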
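The workgroup-to-warp split can be sketched as simple arithmetic. This assumes 32-thread warps and the CUDA convention of linearizing a thread's local index with x varying fastest, then cutting the result into consecutive groups of 32. The helper names are hypothetical.

```python
# Sketch of how a workgroup's threads map onto warps.
# ASSUMPTIONS: 32-thread warps; x-fastest linearization (CUDA convention).
WARP_SIZE = 32


def warps_for_workgroup(dim_x, dim_y=1, dim_z=1, warp_size=WARP_SIZE):
    """Warps needed to cover a dim_x * dim_y * dim_z workgroup (rounded up)."""
    threads = dim_x * dim_y * dim_z
    return (threads + warp_size - 1) // warp_size


def warp_and_lane(tx, ty, tz, dim_x, dim_y, warp_size=WARP_SIZE):
    """Map a thread's local (tx, ty, tz) to (warp index, lane index)."""
    # Linearize with x varying fastest, then y, then z.
    linear = tx + ty * dim_x + tz * dim_x * dim_y
    return linear // warp_size, linear % warp_size


if __name__ == "__main__":
    print(warps_for_workgroup(256))        # 256 threads
    print(warps_for_workgroup(100))        # last warp only partly filled
    print(warps_for_workgroup(16, 16))     # 2D workgroup
    print(warp_and_lane(3, 5, 0, 16, 16))  # thread (3, 5) in a 16x16 group
```

A 100-thread workgroup still needs 4 warps, leaving 28 lanes of the last warp idle, which is one reason workgroup sizes are usually chosen as multiples of the warp size.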

All cores share an L2 cache.

https://www.youtube.com/watch?v=whPSD8sdx-0

Created 6/21/2025
Tended
  • 6/21/2025